Nonasymptotic bounds on the mean square error 
for MCMC estimates via renewal techniques 
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Abstract The Nummellin's split chain construction allows to decompose a Markov 
chain Monte Carlo (MCMC) trajectory into i.i.d. "excursions". Regenerative MCMC 
algorithms based on this technique use a random number of samples. They have 
been proposed as a promising alternative to usual fixed length simulation [24, 32, 
13]. In this note we derive nonasymptotic bounds on the mean square error (MSE) 
of regenerative MCMC estimates via techniques of renewal theory and sequential 
statistics. These results are applied to construct confidence intervals. We then focus 
on two cases of particular interest: chains satisfying the Doeblin condition and a ge- 
ometric drift condition. Available explicit nonasymptotic results are compared for 
different schemes of MCMC simulation. 



1 Introduction 

Consider a typical MCMC setting, where tt is a probability distribution on 3^ and 
/ : ^ — > M a Borel measurable function. The objective is to compute (estimate) the 
integral 
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e:^Kf^j^K{Ax)f{x). (1) 

Assume that direct simulation from n is intractable. Therefore one uses an ergodic 
Markov chain with transition kernel P and stationary distribution K to sample ap- 
proximately from K. Numerous computational problems from Bayesian inference, 
statistical physics or combinatorial enumeration fit into this setting. We refer to 
[31, 29, 9] for theory and applications of MCMC. 

Let {Xn)n>Q be the Markov chain in question. Typically one discards an initial 
part of the trajectory (called burn-in, say of length f) to reduce bias, one simulates 
the chain for n further steps and one approximates 9 with an ergodic average; 

0r^^ = -'i:V(^,)- (2) 

The fixed numbers t and « are the parameters of the algorithm. Asymptotic validity 
of (2) is ensured by a Strong Law of Large Numbers and a Central Limit Theorem 
(CLT). Under appropriate regularity conditions [31, 4], it holds that 

V^(0fi^-0)^^(O,(j2(/)), (3) 

where O^sif) is called the asymptotic variance. In contrast with the asymptotic the- 
ory, explicit nonasymptotic error bounds for 9^^ appear to be very difficult to derive 
in practically meaningful problems. 

Regenerative simulation offers a way to get around some of the difficulties. The 
split chain construction introduced in [2, 27] (to be described in Section 2) allows 
for partitioning the trajectory {X„)„>o into i.i.d. random tours (excursions) between 
consecutive regeneration times Tq, Ti , 72, Random variables 

Tk-l 

L m) (4) 

are i.i.d. for k= 1,2, .. . (So(/) can have a different distribution). Mykland et al. in 
[24] suggested a practically relevant recipe for identifying Tq, , . . . in simula- 
tions (formula (2) in Section 2). This resolves the burn-in problem since one can 
just ignore the part until the first regeneration Tq. One can also stop the simulation 
at a regeneration time, say T,-, and simulate r full i.i.d. tours, c.f. Section 4 of [32]. 
Thus one estimates 9 by 

:= lV(^,) = (5) 

where Ti^^Tj^ — Ti^_i ^ ^^ (1) are the lengths of excursions. The number of tours r 
is fixed and the total simulation effort is random. Since 9^'^^ involves i.i.d. random 
variables, classical tools seem to be sufficient to analyse its behaviour Asymptoti- 
cally, (5) is equivalent to (2) because 
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7^(0^8- 0)->^(O,(T,l(/)), (r-^c), 

where m E Ti . Now rm = E (r^ — To), the expected length of the trajectory, plays 
the role of n. However, our attempt at nonasymptotic analysis in Subsection 3.1 
reveals unexpected difficulties: our bounds involve m in the denominator and in 
most practically relevant situations m is unknown. 

If m is known then instead of (5) one can use an unbiased estimator 

0-b -.^^Y. E,{f), (6) 

Quite unexpectedly, (6) is not equivalent to (5), even in a weak asymptotic sense. 
The standard CLT for i.i.d. summands yields 

^A^(0^''-0)^^(O,CT4b(/)), (r^^), 

where (J^„^,{f) := VarSi (/) /m is in general different from (J^^if). 

We introduce a new regenerative-sequential simulation scheme, for which better 
nonasymptotic results can be derived. Namely, we fix n and define 

R{n) := min{r : 7). > To + n}. 

The estimator is defined as 

We thus generate a random number of tours as well as a random number of samples. 
Our approach is based on inequalities for the mean square error, 

MSE:=E(0-0)2. 

Bounds on the MSE can be used to construct fixed precision confidence intervals. 
The goal is to obtain an estimator Q which satisfies 

P(|0-0|<£)>l-a, (8) 

for given e and a. We combine the MSE bounds with the so called "median trick" 
[15, 26]. One runs MCMC repeatedly and computes the median of independent 
estimates to boost the level of confidence. In our paper, the median trick is used in 
conjunction with regenerative simulation. 

The organization of the paper is the following. In Section 2 we recall the split 
chain construction. Nonasymptotic bounds for regenerative estimators defined by 
(5), (6) and (7) are derived in Section 3. Derivation of more explicit bounds which 
involve only computable quantities is deferred to Sections 5 and 6, where we con- 
sider classes of chains particularly important in the MCMC context. An analogous 



4 



Krzysztof Latuszynski, Blazej Miasojedow and Wojciech Niemiro 



analysis of the non-regenerative scheme (2) was considered in [20] and (in a differ- 
ent setting and using different methods) in [33]. 

In Section 4 we discuss the median trick. The resuhing confidence intervals are 
compared with asymptotic results based on the CLT. 

In Section 5 we consider Doeblin chains, i.e. uniformly ergodic chains that sat- 
isfy a one step minorization condition. We compare regenerative estimators (5), (6) 
and (7). Moreover, we also consider a perfect sampler available for Doeblin chains, 
c.f. [35, 14]. We show that confidence intervals based on the median trick can out- 
perform those obtained via exponential inequalities for a single run simulation. 

In Section 6 we proceed to analyze geometrically ergodic Markov chains, assum- 
ing a drift condition towards a small set. We briefly compare regenerative schemes 
(5) and (7) in this setting (the unbiased estimator (6) cannot be used, because m is 
unknown). 



2 Regenerative Simulation 

We describe the setting more precisely. Let {X„)„>o be a Markov chain with tran- 
sition kernel f on a Polish space ^ with stationary distribution n, i.e. nP = n. 
Assume P is TT-irreducible. The regeneration/split construction of Nummelin [27] 
and Athreya and Ney [2] rests on the following assumption. 

Assumption 2.1 (Small Set) There exist a Borel set 7 C j^T of positive % measure, 
a number j3 > and a probability measure v such that 

p{x,-) > j3i(xey)v(-). 

Under Assumption 2.1 we can define a bivariate Markov chain (X„,77,) on the 
space ^ X {0, 1} in the following way. Variable J7,_i depends only on X„_i via 
IP(Jn-i = 1 1'^H-i = x) ~ j3I(.x: G J). The rule of transition from {X„_i,r„_i) to X„ is 
given by 

Pix„ eA\r„_i = i,x„_i =x) = v(A), 

P(X„ gA|I;,_i =0,X„_i =x)^Q{x,A), 
where Q is the normalized "residual" kernel given by 

P{x,-)-pl{xeJ)vi-) 



l-j3l(xey) 



Whenever r„_i = 1, the chain regenerates at moment n. The regeneration epochs 
are 



7b:=min{«:i;,_i = 1}, 

Tk := min{n > r^_i : r„_i = 1}. 
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The random tours defined by 

■= {Xti,_i , ■ ■ ■ ,Xt^-u T<r), where T^. = r^. - 7i_i , (9) 

are independent. Without loss of generaUty, we assume thatXo ^ v(-), unless stated 
otherwise. Under this assumption, all the tours are i.i.d. for k > 0. We therefore 
put Tq:=0 and simplify notation. In the sequel symbols P and E without subscripts 
refer to the chain started at V. If the initial distribution £, is other than v, it will be 
explicitly indicated by writing and E ^ . Notation m — KTi stands throughout the 
paper. 

We assume that we are able to identify regeneration times T^,. Mykland et al. 
pointed out in [24] that actual sampling from Q can be avoided. We can generate the 
chain using transition probabability P and then recover the regeneration indicators 
via 

r(;;.,^i|x.,x._,,^.«,-,ey)J^. 

where v(d3') /P{x, dy) denote the Radon-Nikodym derivative (in practice, the ratio of 
densities). Mykland's trick has been established in a number of practically relevant 
families (e.g. hierarchical linear models) and specific Markov chains implementa- 
tions, such as block Gibbs samplers or variable-at-a-time chains, see [17, 25]. 



3 General results for regenerative estimators 

Recall that / : — >^ R is a measurable function and = nf. We consider block 
sums defined by (4). The general Kac theorem states that the mean occupation 

time during one tour is proportional to the stationary measure (Theorem 10.0.1 in 
[23] or Equations (3.3.4), (3.3.6), (3.4.7) and (3.5.1) in [28]). This yields 

^ EEi{f)=mnf = me. 



From now on we assume that ESi (/)^ < °° and Et^ <oo. For a discussion of 
these assumptions in the MCMC context, see [13]. Let / :^ f~ nf and define 

2 ESi(/)2 
^as(/) := , (10) 

a^:=^. (11) 
m 

Remark 1. Under Assumption 2. 1, finiteness of E Si (/)^ is a sufficient and neces- 
sary condition for the CLT to hold for Markov chain (X„)„>o and function /. This 
fact is proved in [4] in a more general setting. For our purposes it is important to 
note that (J^sif) in (10) is indeed the asymptotic variance which appears in the CLT. 
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3.1 Results for el^^ 



We are to bound the estimation error which can be expressed as follows: 
where :~ ^kif) ^ STk = Sk{f). Therefore, for any < 5 < 1, 



(12) 



> rme{l — 8) 



T,.<rm{l-8) 



Since dj^ are i.i.d. with Ec/i = and Vaxd\ = ma^^{f), we can use Chebyshev in- 
equality to bound the first term above; 



k=l 



> rme{\ - 5) < 



The second term can be bounded similarly. We use the fact that are i.i.d. with 
E Ti = m to write 

. ^2 
Tr < rm{\ - 5) < 



We conclude the above calculation with in following Theorem. 

Theorem 3.1 Under Assumption 2.1 the following holds for every < 5 < 1 



0| >e) < — 
rm 



and is minimized by 



5^5* 



£2(1-5)2 
^2/3 



m5^ 



(13) 



2/3 



(/)£ 



-2/3 . 2/3 ■ 



Obviously, ET, = rm is the expected length of trajectory. The main drawback of 
Theorem 3.1 is that the bound on the estimation error depends on m, which is typi- 
cally unknown. Replacing m by 1 in (13) would be highly inefficient. This fact mo- 
tivates our study of another estimator, 0^,^^'^^'^, for which we can obtain much more 
satisfactory results. We think that the derivation of better nonasymptotic bounds for 
O]!^^ (not involving m) is an open problem. 
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Recall that 0™'' can be used only when m is known and this situation is rather rare 
in MCMC applications. The analysis of 0™'' is straightforward, because it is simply 
a sum of i.i.d. random variables. In particular, we obtain the following. 

Corollary 3.2 Under Assumption 2. 1, 

E(ft™''-0)2 = P(|g™''-0| >e) < ^""^y . 

rm rm £^ 

Note that C7^^i,(/) ~ VarSi {f )/ni can be expressed as 

<^Lif) = ^Uf) + 0'ct2 + 20p(/, 1), (14) 

where p(/, 1) ;= Cov(Si (/), Si (l))/m. This follows from the simple observation 
that VarSi (/) = E (Si (/) + (ti - m))l 



3.3 Results for Of ^"'"^ 

The result below bounds the MSE and the expected number of samples used to 
compute the estimator. 

Theorem 3.3 If Assumption 2.1 holds then 

(0 E (0,f g-^^q - 0)^ < E Trm 

and 

{a) E rx(„) <n + Co, 

where 

Co := (J^ +771. 
Corollary 3.4 Under Assumption 2.1, 

E(0,fg-^'=i-0)2 < fl + ^y (15) 

n \ n J 

P(|0^'=s-^"i-0| >e) < + — V (16) 

ne-^ \ n J 

Remark 2. Note that the leading term ofj,(/)/77 in (15) is "asymptotically correct" 
in the sense that the standard fixed length estimator has MSE ^ ^Isif)/^- The 
regenerative-sequential scheme is "close to the fixed length simulation", because 
Um„^ooEr^(„)/?j = 1. 
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Proof (of Theorem 3.3). Just as in (12) we have 

where pairs {d^, z^) are i.i.d. with Edi = and Varc/i — ma^^{f). Since 7]j(„) > n, it 
follows that 



R(„) \ 2 



Since is a stopping time with respect to = C7(((ii, Ti ),..., (t/j, T^.)), we are 
in a position to apply the two Wald's identities (see Appendix). The second identity 
yields 

/«(") \^ 

E ( 52 t/J = Vart/i ER{n) = mal{f)ER{n). 

In this expression we can replace mER{n) by ET/jj,,) because of the first Wald's 
identity: 

R{n) 

E Tgf^,,^ = E ^ = Eti ER{n) = }nER{n) 

k=l 

and (i) follows. 

We now focus attention on bounding the expectation of the "overshoot" A (n) := 
'^R{n) ^ Since we assume that Xq ^ v, the cumulative sums Ti ^ Ti < T2 < . . . < 
Tit < ... form a (nondelayed) renewal process in discrete time. Let us invoke the 
following elegant theorem of Lorden [21, Theorem 1]: 

EA{n) < ET^/m. 

This inequality combined with (11) yields immediately ET^f,,) = E(n + A(n)) < 
n + o^ + m, i.e. (ii). 



4 The median trick 



This ingeneous method of constructing fixed precision MCMC algorithms was 
introduced in 1986 in [15], later used in many papers concerned with computa- 
tional complexity and further developed in [26]. We run I independent copies of 
the Markov chain. Let 0'-'' be an estimator computed in jth run. The final esti- 
mate is 9 :~ med(0f'\ . . . , 0^''). To ensure that 9 satisfies (8), we require that 

some modest level of confidence 1 — a < 
I — a. This is obtained via Chebyshev's inequality, if a bound on MSB is available. 
The well-known Chernoff 's inequality gives for odd /, 
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P(|0 - 0| > e) < i [4fl(l = iexp|^ln[4fl(l . (17) 

It is pointed out in [26] that under some assumptions there is a universal choice of 
a, which nearly minimizes the overall number of samples, a* sa 0. 1 1969. 

Let us now examine how the median trick works in conjunction with regenerative 
MCMC. We focus on On^^'^"'^, because Corollary 3.4 gives the best available bound 
on MSB. We first choose « such that the right hand side of (16) is less than or equal 
to a*. Then choose / big enough to make the right hand side of (17) (with a = a*) 
less than or equal to a. Compute estimator 9^^^'^^'^ repeatedly, using I independent 
runs of the chain. We can see that (8) holds if 

„>^4^+Co, (18) 
/ > C2ln(2a)"' and/isodd, (19) 

where d := 1/a* w 8.3549 and C2 := 2/ln[4fl*(l -a*)]"' « 2.3147 are absolute 
constants. Indeed, (18) entails Cia^^{f)/{£^n) < 1 -Co/«, so Cia^s{f)/{£^n){l + 
Co/n) < 1 —C(j/n^ < 1. Consequently (7^s(/)/(£^«)(l +Co/«) < a* and we are in 
a position to apply (16). 

The overall (expected) number of generated samples is IK Tui^j^-^ ^ «Z as e — > 
and n 00, by Theorem 3.3 (ii). Consequently for e the cost of the algorithm 
is approximately 

nl ^ c'^^^\og{2a)-\ (20) 

where C = C1C2 ~ 19.34. To see how tight is the obtained lower bound, let us com- 
pare (20) with the familiar asymptotic approximation, based on the CLT. Consider 
an estimator based on one MCMC run of length n, say 0„ = d^^ with t = 0. From 
(3) we infer that 

lim P(|0„-0| >£) = «, 
£^0 

holds for 

n^^^[<p-\l-a/2)\\ (21) 

where is the quantile function of the standard normal distribution. Taking into 
account the fact that [^""^(1 — oi,/2)Y ^ 21og(2a)"' for a — )- we arrive at the 
following conclusion. The right hand side of (20) is bigger than (21) roughly by a 
constant factor of about 10 (for small e and a). The important difference is that (20) 
is sufficient for an exact confidence interval while (21) only for an asymptotic one. 
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5 Doeblin Chains 



Assume that the transition kernel P satisfies the following Doeblin condition: there 
exist j3 > and a probability measure v such that 

P{x,-)>^v{-) forevery x€^. (22) 

This amounts to taking J in Assumption 2.1. Condition (22) implies that 

the chain is uniformly ergodic. We refer to [31] and [23] for definition of uniform 
ergodicity and related concepts. As a consequence of the regeneration construction, 
in our present setting Tj is distributed as a geometric random variable with parameter 
/3 and therefore 

^1 .2 VarTi 1-/3 

m = E Ti = — and at. = = — — . 

p m p 

Bounds on the asymptotic variance (7^, (/) under (22) are well known. Let = nf^ 
be the stationary variance. Results in Section 5 of [4] imply that 



(^as(/) <<y"{l+ , ^ „ < -IT- (23) 




Since in [4] a more general situation is considered, which complicates the formulas, 
let us give a simple derivation of (23) under (22). By (10) and the formula (29) given 
in the Appendix, 



aiif) < ^e,/(Xo)2 + 2£e,|/(Xo)/(X,0|I(ti >/). 

The first term above is equal to a^. To bound the terms of the series, use Cauchy- 
Schwarz and the fact that, under (22), random variables Xq and Ti are indepen- 
dent. Therefore E^\f{Xo)f{XiMzi > i) < {Ej{XifEj{XofF^{Ti > i))'^^ = 
(7^(1 — j3)'/^. Computing the sum of the geometric series yields (23). 

If the chain is reversible, there is a better bound than (23). We can use the well- 
known formula for (y^^{f) in terms of the spectral decomposition of P (e.g. ex- 
pression "C" in [11]). Results of [30] show that the spectrum of P is a subset of 
[— 1 + j3, 1 — /3]. We conclude that for reversible Doeblin chains, 

<yi{f) < ^(^2 < — . (24) 

An important class of reversible chains are Independence MetropoUs-Hastings 
chains (see e.g. [31]) that are known to be uniformly ergodic if and only if the rejec- 
tion probability r{x) is uniformly bounded from 1 by say 1 — j3. This is equivalent 
to the candidate distribution being bounded below by j5n (c.f. [22, 1]) and translates 
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into (22) with v = %. The formula for 0^^{f) in (23) and (24) depends on /3 in an 
optimal way. Moreover (24) is sharp. To see this consider the following example. 

Example 5.1 Let j3 < 1/2 and define a Markov chain {X„)n>Q on ^ ~ {0, 1} with 
stationary distribution 7t ~ [I /2, 1 /2] and transition matrix 



P = 



l-j3/2 j3/2 
i3/2 l-j3/2 



Hence P = jSTT + (1 — j3)/2 one/ P{x, ■) > P^t. Note that the residual kernel Q is in 
our example the identity matrix 12- Thus, before the first regeneration Ti the chain 
does not move. Let f(x) = x. Thus C7^ = 1/4. To compute CS^^{f) we use another 
well-known formula (expression "B" in [11]): 

<yUf) = ct2 + 2£cov,{/(Xo),/(X,)} 

i=i 

= ct2 + 2ct2£(1 -i3y = ^a^ 



To compute OjJnbl/)' ""^^ ^^'^^ •^lif) ~ -'^(^O = 1)^1- Since Ti is independent of Xq 
andXo ^ v — 7Z we obtain 

C7u'nb(/) =i8VarSi(/) = [E Var(Si (/)|Xo) + VarE (S, (/)|Xo)] 

2/3 4j3 /3 

Interestingly, in this example O^^^^ (/) > (X^j (/). 

In the setting of this Section, we will now compare upper bounds on the total 
simulation effort needed for different MCMC schemes to get P(| — 1 > e) < a. 



5.1 Regenerative-sequential estimator and the median trick 

Recall that this simulation scheme consists of I MCMC runs, each of approximate 
length n. Substituting either (23) or (24) in (20) we obtain that the expected number 
of samples is 

nZ- 19.34^1og(2a)-' and n/ - 19.34^^— ^^^log(2a)'' (25) 

(respectively in the general case and for reversible chains). Note also that in the 
setting of this Section we have an exact expression for the constant Co in Theorem 
3.3. Indeed, Co = 2/j3-l. 



12 Ki'zysztof Latuszyriski, Blazej Miasojedow and Wojciech Niemiro 

5.2 Standard one-run average and exponential inequalty 



For uniformly ergodic chains a direct comparison of our approach to exponential 
inequalities [10, 18] is possible. We focus on the result proved in [18] for chains 
on a countable state space. This inequality is tight in the sense that it reduces to 
the Hoeffding bound when specialised to the i.i.d. case. For / bounded let ||/||oo := 
sup^.g J l/C'^) I- Consider the simple average over n Markov chain samples, say 0„ = 
d^^ with f = 0. For an arbitrary initial distribution E, we have 

P,(|g„-e|>s)<2exp|-!i^(^s--^)'}. 

After identifying the leading terms we can see that to make the right hand side less 
than a we need 

||f||2 2C72 

"~i|l^log(«/2)"'>^log(«/2)"'- (26) 

Comparing (25) with (26) yields a ratio of roughly 40j3 or 20j3 respectively. This in 
particular indicates that the dependence on j3 in [10, 18] probably can be improved. 
We note that in examples of practical interest j3 usually decays exponentially with 
the dimension of ^ and using the regenerative-sequential-median scheme will of- 
ten result in a lower total simulation cost. Moreover, this approach is valid for an 
unbounded target function /, in contrast with classical exponential inequalities. 



5.3 Perfect sampler and the median trick 

For Doeblin chains, if regeneration times can be identified, perfect sampling can 
be performed easily as a version of read-once algorithm [35]. This is due to the 
following observation. If condition (22) holds and Xq^ v then 

Xt,-i, ^-1,2,... 

are i.i.d. random variables from n (see [5, 28, 14, 4] for versions of this result). 
Therefore from each random tour between regeneration times one can obtain a sin- 
gle perfect sample (by taking the state of the chain prior to regeneration) and use it 
for i.i.d. estimation. We define 

k=l 

Clearly 
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E(ef'^^-0)2 = — and Pdef-^-ei >e) < 

r re- 

Note that to compute Of^^^ we need to simulate n ^ r//3 steps of the Markov chain. 
If we combine the perfect sampler with the median trick we obtain an algorithm 
with the expected number of samples 

19.34^ log(2a)-i. (27) 

Comparing (25) with (26) and (27) leads to the conclusion that if one targets rigor- 
ous nonasymptotic results in the Doeblin chain setting, the approach described here 
outperforms other methods. 



5.4 Remarks on other schemes 

The bound for Q]-^^ in Theorem 3.1 is clearly inferior to that for Q^°~^^'^ in Corollary 
3.4. Therefore we excluded the scheme based on dl^^ from our comparisons. 

As for 0™^, this estimator can be used in the Doeblin chains setting, because 
m = l/j3 is known. The bounds for 0™'' in Subsection 3.2 involve <y^^^{f). Although 
we cannot provide a rigorous proof, we conjecture that in most practical situations 
we have (y^^^{f) > CTasif)^ because p(/, 1) in (14) is often close to zero. If this is 
the case, then the bound for 0™*' is inferior to that for On^^'^'"^. 



6 A Geometric Drift Condition 

Using drift conditions is a standard approach for establishing geometric ergodicity. 
We refer to [31] or [23] for the definition and further details. The assumption below 
is the same as in [3]. Specifically, let J be the small set which appears in Assumption 
2.1. 

Assumption 6.1 (Drift) There exist a function V : ^ ^ constants A < 1 

and K < °° such that 



PV{x) := / P{x,dy)V{y) < 

JSC 



XV {x) forx(^J, 
K for X G J, 



In many papers conditions similar to Assumption 6.1 have been established for re- 
ahstic MCMC algorithms in statistical models of practical relevance [12, 7, 8, 16, 
17, 34]. This opens the possibility of computing our bounds in these models. 
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Under Assumption 6.1, it is possible to bound (J^^if), (J^ and Co which appear 
in Theorems 3.1 and 3.3, by expressions involving only A, j3 and K. The following 
result is a minor variation of Theorem 6.5 in [19]. 

Theorem 6.2 If Assumptions 2.1 and 6.1 hold and f is such that WfWyi/i ■= 
sup_^|/(jt:)|/V'/^(x) < oo, then 



oUf) < ll/l 



l+Ai/2 2(/ri/2_A'/2_j3(2-Ai/2)) ^ 



Co<-^T7?^(^ ^ ) + ■ 



l-A'/2 ^ ^ j3(l-Ai/2) 

To bound we can use the obvious inequality = Cq — m < Cq — I. Moreover, 
one can easily control ttV and 7ry'/2 and further replace ||/||yi/2 e.g. by ||/|iyi/2 + 
_ ;t^/2)/(l - A V2), we refer to [19] for details. 
Let us now discuss possible approaches to confidence estimation in the setting 
of this section. Perfect sampling is in general unavailable. For unbounded / we 
cannot apply exponential inequalities for the standard one-run estimate. Since m 
is unknown we cannot use 0™''. This leaves 9r'^^ and O^^^ '^^'^ combined with the 
median trick. To analyse Or'~^ we can apply Theorem 3.1. Upper bounds for (J^^{f) 
and (7^ are available. However, in Theorem 3.1 we will also need a lower bound on 
m. Without further assumptions we can only write 

^ > 4- (28) 



n{J)l5- p- 

In the above analysis (28) is particularly disappointing. It multiplies the bound by 
an unexpected and substantial factor, as 7z(J) is typically small in appUcations. For 
gieg-seq jjave much more satisfactory results. Theorems 3.3 and 6.2 can be used to 
obtain bounds which do not involve m. In many realistic examples, the parameters 
/3, A and K which appear in Assumptions 2.1 (Small Set) and 6.1 (Drift) can be 
explicitly computed, see e.g. [16, 17, 34]. 

We note that nonasymptotic confidence intervals for MCMC estimators under 
drift condition have also been obtained in [20], where identification of regeneration 
times has not been assumed. In absence of regeneration times a different approach 
has been used and the bounds are typically weaker For example one can compare 
[20, Corollary 3.2] (for estimator 0/\^) combined with the bounds in [3] with our 
Theorems 3.3 and 6.2 (for estimator ^'^'"'). 
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Appendix 

For convenience, we recall the two identities of Abraham Wald, which we need 
in the proof of Theorem 3.3. Proofs can be found e.g. in [6, Theorems 1 and 3 in 
Section 5.3]. 

Assume that 771 , . . . , 7]^., . . ., are i.i.d. random variables and is a stopping time 
such that E/? < 00. 

I Wald identity: IfE|T7i|<oothen 

R 

k=\ 

II Wald identity: If E 771 = and E 77^ < oc then 



R 



E ^ 77J =E/?E77 



In Section 5 we used the following formula taken from [28, Equation (4.1.4)]. In 
the notation of our Sections 2 and 3, for every g > we have 

^l^^^ML ^E,g{X^,f + 2Y^EM)8{miT > i). (29) 

In [28] this formula, with g ~ f, is used to derive an expression for the asymptotic 
variance C7^s(/) = E ySi (/)/m under the assumption that / is bounded. For g >0, 
the proof is the same. 



